Received: from relay2.UU.NET by optima.CS.Arizona.EDU (5.65c/15) via SMTP
    id AA21109; Sat, 1 May 1993 21:48:00 MST
Received: from spool.uu.net (via LOCALHOST.UU.NET) by relay2.UU.NET
    with SMTP (5.61/UUNET-internet-primary)
    id AA23076; Sun, 2 May 93 00:48:09 -0400
Received: from sangam.UUCP by spool.uu.net with UUCP/RMAIL
    (queueing-rmail) id 004726.20115; Sun, 2 May 1993 00:47:26 EDT
Received: by sangam.ncst.ernet.in (4.1/SMI-4.1-MHS-7.0)
    id AA11843; Sun, 2 May 93 10:08:15+0530
Received: from kailash.cse.iitb.ernet.in by iitb.ernet.in
    SENDMAIL Version (4.1/SMI-4.1-MHS-7.0)
    id AA22512; Sun, 2 May 93 10:00:42+0530
Received: by kailash.cse.iitb.ernet.in (4.1/SMI-4.1)
    id AA03479; Sun, 2 May 93 10:02:07 IST
Date: Sun, 2 May 93 10:02:07 IST
From: nls@cse.iitb.ernet.in (N L Sarda)
Message-Id: <9305020432.AA03479@kailash.cse.iitb.ernet.in>
To: tsql@cs.arizona.edu

        BENCHMARK QUERIES : ON THEIR CLASSIFICATION

In our efforts to define classes of queries to be used as benchmarks,
I felt that the classification can also be worked out from the user's
point of view (in addition to the classification done by C. S. Jensen
based on SQL format), so that certain types, expected to be more
frequent than others, can be emphasized. The following is an
exploration in this direction.

Our target database contains historical data of one or more sets of
entities. An entity has many facts stored about it in the database.
A fact is true over some valid-time interval. There is one 'current'
fact and the others are history (past facts). A real-world entity may
be 'in and out' of our database at various times (e.g., an employee
being fired and re-hired). Thus, it exists at some times and does not
at others.

Our retrieval may obtain data from one entity set or from multiple
entity sets. A particular case of interest in a multiple entity set
query is when the entities existed concurrently.

Our retrieval might ask for the full facts stored in the database, or
for parts of facts with coalescing and/or time-slicing. The latter
limits the time values in the result to the time-slice boundaries.

The focus of retrieval may be only the current data, only the past
data, or the historical data (current + past). On the other hand, the
user may want to obtain aggregated results.

The retrieval may be constrained by a predicate on non-temporal as
well as temporal attributes. We may focus only on the latter in
defining a taxonomy for our benchmark queries.

A large number of time domain operators (or functions) can be
identified for constructing temporal predicates. Query languages may
differ in which of these operators they include. However, since
languages can easily be enriched with more of these
operators/functions, it may not be necessary to define query classes
based on these operators.

We now come to stating our query classes formally. We use a COBOL-type
notation for this: square brackets give an option and braces define a
choice. Words outside brackets/braces are for readability. (The ugly
representation of large braces may please be pardoned.)

------------------------------------------------------------
A query class is

    [CONCURRENT] [TIME-SLICED] [COALESCED]   | CURRENT    |
                                             | PAST       |
                                            {  HISTORICAL  }
                                             | AGGREGATED |
                                             |            |
    retrieval

    [ based on   | EXISTENCE     |
                {                 }  of
                 | NON-EXISTENCE |

      [AGGREGATED] relationship   | IN / WITH |
                                 {             }
                                  | AT ALL    |

       | INSTANT  |
      {  INTERVAL  }
       | ELEMENT  |
       | DURATION |

      specified by   | user          |
                    {                 }
                     | computed from |
                     | other data    |
    ]
----------------------------------------------------------

Let us mention below some example classes obtained from the above
categorization of query classes:

1. current retrieval (based on non-temporal predicate)

2. historical retrieval based on existence relationship in (an)
   interval specified by user.

3. time-sliced coalesced historical retrieval based on existence
   relationship at all (instants in the) interval given (a) by user,
   and (b) computed (possibly using another retrieval).

4. concurrent historical retrieval based on existence relationship
   with an interval.

5. past retrieval based on non-existence relationship at all
   (instants in an) interval given by user.

The above definition also leads to a fairly large number of query
classes. It may be desirable to cut down the number by not
considering every possible combination.

The classification above can be easily related to the taxonomy
proposed by Jensen as follows:

a) concurrent multi-entity retrieval corresponds to an imposed
   interval valid-time component in the output and a
   containment-based operator on the intervals of the participating
   entities (i.e., relations) in the selection-based taxonomy. There
   are many ways of relating entities other than their concurrent
   existence, but concurrency will be more commonly required. The
   other categories need to be explicitly 'programmed' using suitable
   operators in selection and computations in output.

b) time-slicing can also be expressed using interval derivation in
   output and a containment operator in selection.

c) there is no easy way of specifying coalescing without complicated
   grouping and computations. The situation is similar to some extent
   to that of 'distinct' in SQL, where duplicates are not
   automatically eliminated. We expect that coalescing may also not
   be done automatically in temporal SQLs.

d) queries on current and past data are also straightforward to
   express in the Jensen taxonomy, but are mentioned explicitly here
   as a class because of their importance (they are likely to be more
   frequent).

e) the word 'relationship' in the above class definition format
   represents the various possible time domain operators (including
   the duration, ordering and containment based operators defined in
   the Jensen taxonomy).

f) the queries based on 'non-existence' are emphasized as they are
   easy to state in English, but difficult to formulate in SQL (they
   usually need nested queries).

Comments/questions are welcome.

Nandlal Sarda
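
As an illustration of the terms used above (coalescing, time-slicing,
and existence 'at all' instants of an interval), here is a minimal
Python sketch. It is not part of any proposed language or of the
Jensen taxonomy. It assumes integer time instants and half-open
valid-time intervals [start, end); the names Fact, coalesce,
time_slice and exists_throughout are hypothetical and chosen only for
this example.

```python
from collections import defaultdict
from typing import NamedTuple


class Fact(NamedTuple):
    """A fact about an entity, valid over the half-open interval [start, end)."""
    entity: str
    value: str
    start: int
    end: int


def coalesce(facts):
    """Merge value-equivalent facts whose valid-time intervals overlap or meet."""
    groups = defaultdict(list)
    for f in facts:
        groups[(f.entity, f.value)].append((f.start, f.end))
    result = []
    for (entity, value), intervals in sorted(groups.items()):
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:      # overlaps or meets the current run
                cur_end = max(cur_end, end)
            else:                     # a gap: close the run, start a new one
                result.append(Fact(entity, value, cur_start, cur_end))
                cur_start, cur_end = start, end
        result.append(Fact(entity, value, cur_start, cur_end))
    return result


def time_slice(facts, lo, hi):
    """Keep facts overlapping [lo, hi) and clip their intervals to it."""
    return [
        Fact(f.entity, f.value, max(f.start, lo), min(f.end, hi))
        for f in facts
        if f.start < hi and f.end > lo
    ]


def exists_throughout(facts, entity, lo, hi):
    """True if the entity has some fact valid at every instant of [lo, hi)."""
    # Ignore the fact values so that only the entity's existence is coalesced.
    spans = coalesce(
        Fact(f.entity, "", f.start, f.end) for f in facts if f.entity == entity
    )
    return any(s.start <= lo and hi <= s.end for s in spans)
```

With an employee recorded as a clerk over [1, 5) and [5, 8), a manager
over [8, 12), and a clerk again over [15, 20), coalescing merges the
first two clerk facts into [1, 8) but keeps [15, 20) separate, since
the employee was 'out' of the database in between. A non-existence
'at all' test over an interval can be sketched the same way, by
checking that time_slice returns nothing for that entity.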